LOCAL MOMENT ESTIMATION OF POPULATION SPECTRUM 



1 



A LOCAL MOMENT ESTIMATOR OF THE SPECTRUM 
OF A LARGE DIMENSIONAL COVARIANCE MATRIX 



Abstract: This paper considers the problem of estimating the population spectral 
distribution from a sample covariance matrix in large dimensional situations. We 
generalize the contour-integral based method in Mestre (2008) and present a local 
moment estimation procedure. Compared with the original one, the new proce- 
dure can be applied successfully to models where the asymptotic clusters of sample 
eigenvalues generated by different population eigenvalues are not all separate. The 
proposed estimates are proved to be consistent. Numerical results illustrate the im- 
plementation of the estimation procedure and demonstrate its efficiency in various 
cases. 

Key words and phrases: Empirical spectral distribution, Large covariance matrix. 
Moment estimation, Population spectral distribution, Stieltjes transform. 

1. Introduction 

Let xi, . . . , x„ be a sequence of i.i.d. zero-mean random vectors in W or C^, 
witli a common population covariance matrix Ep. When the population size p is 
not negligible with respect to the sample size n, modern random matrix theory 
indicates that the sample covariance matrix Sn = l^"=i Xi^*/n does not approach 
Sp. Therefore, classical statistical procedures based on an approximation of Sp 
by Sn become inconsistent in such large dimensional situations. 

More precisely, the spectral distribution (SD) of an m x m Hermitian 
matrix (or real symmetric) A is the measure generated by its eigenvalues {A^}, 



where db denotes the Dirac point measure at b. Denote by (cTi)i<j<p the p eigen- 
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values of Sp. We are particularly interested in the following SD 

^ i=i 

In large dimensional frameworks, both dimensions p and n will grow to infinity. 
It is then natural to assume that Hp converges weakly to a limit H. Both the 
SD Hp and its limit H are referred as the population spectral distribution (PSD) 
of the observation model. 

The main observation is that for large dimensional data, the empirical SD 
(ESD) Fn := -F'^" of Sn is far from the PSD Hp. Indeed, under reasonable as- 
sumptions, when both dimensions p and n grow proportionally, almost surely, the 
ESD Fn will weakly converge to a deterministic distribution F, which in general 
has no explicit form but is linked to the PSD H via the so-called Marcenko- 
Pastur equation, see Marcenko and Pastur (1967); Silverstein (1995); Silverstein 
and Bai (1995), and Section 2.1. 

A natural question here is the recovering of the PSD Hp (or its limit H) 
from the ESD F„. This question has a central importance in several popular 
statistical methodologies like principal component analysis (Johnstone, 2001) or 
factor analysis that all rely on efficient estimations of some population covariance 
matrices. 

Recent works on this problem include El Karoui (2008), where the author 
proposed a nonparametric approach by solving the Marcenko-Pastur equation on 
the upper complex plane, and then obtained consistent estimates of H. Rao et 
al. (2008) investigated the asymptotic distributions of the moments of the ESD 
Fn and introduced a Gaussian likelihood to get consistent estimates of H. In the 
work of Mestre (2008), each mass of a discrete PSD H is represented by a contour 
integral under a certain eigenvalue splitting condition and consistent estimators 
of H are then obtained. Recently, Bai et al. (2010) modified the approach in Rao 
et al. (2008) and turned it to a fully moments based procedure. Moreover beyond 
consistency, the authors proved also a central limit theorem for the estimator. 
Li et al. (2012) synthesized both the optimization approach in El Karoui (2008) 
and the parametric setup in Bai et al. (2010), where an important improvement 
is that they changed the optimization problem from the complex plane to the 
real line by considering the extension of the Stieltjes transform on the real line. 
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Among all the above contributions, the contour-integral based method in 
Mestre (2008) is well known for its high efficiency and easy computation. How- 
ever, the method is limited to a small class of discrete PSDs where, in addition, 
the imposed eigenvalue splitting condition states that distinct population eigen- 
values should generate non-overlapping clusters of sample eigenvalues. Note that 
this method has been recently employed for subspace estimation in a so-called 
"information plus noise" model in Hachem et al. (2011). 

Our purpose in this paper is to extend Mestre's method to a more general 
situation where the splitting condition may not be satisfied. For a discrete PSD 
H with finite support on M"*", it is always true that one separate interval of the 
support Sp of the limiting SD (LSD) F corresponds to only one atom of H if 
the dimension ratio c is close to zero (the splitting condition holds). When c is 
increased gradually, adjacent intervals of Sp become closer, and some of them 
may ultimately merge into a larger interval (the splitting condition fails). Such 
merged intervals thus corresponds to more than one atom of H, and establishing 
their relationship in such a situation gives birth to our local estimation method. 

Our strategy is that we first divide the PSD H into a number of sub- 
probability measures. Hi, ... , Hm, such that each Hi corresponds to one separate 
interval of Sp. Then, we develop a method to approximate the moments of Hi. 
An estimate of Hi can be obtained by solving a system of moment equations. 
Collecting all these estimates finally produces an estimator of H. It will be shown 
that when m is equal to the number of atoms of H (no merged intervals at all), 
this estimator reduces to the one in Mestre (2008); If in contrary m = 1 (all 
intervals merged into a single one), the estimator is equivalent to the one in Bai 
et al. (2010). 

The rest of the paper is organized as follows. In the next section, we review 
some useful results from Random Matrix Theory and introduce the division of a 
PSD H according to the separation of the corresponding LSD F. A fast algorithm 
to solve the associated moment equations is also given. In Section 3, we present 
the theoretical supports and the detailed procedure of our estimation. In Section 
4, simulation experiments are carried out to compare our new estimator with the 
estimator in Mestre (2008) and the moment estimator in Bai et al. (2010). Some 
conclusions and remarks are presented in Section 5. 
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2. Limiting spectral distribution and division of a PSD H 

2.1 The Marcenko-Pastur equation 

Recall that the Stieltjes transform of G, a measure supported on the real 
line, is defined as 

sg{z) = [ — *— dG(a;), z G C+, 
J X- z 

where C"*" is the set of complex numbers with positive imaginary part. 

Let Sg be the support set of G and Sq its complementary set. For the 
developments in this paper, we need to extend the Stieltjes transform to C \ 5g 
by 

'(z*) {z£C~ ={z£C: < 0}), 

lim£_i.o+ s{x + ei) (z = x G M \ Sg), 

where a* denotes the complex conjugate of a. The existence of the limit in the 
second term follows from the dominated convergence theorem. 

Denote by Ai < • • • < Ap the eigenvalues of the sample covariance matrix Sn- 
Then the ESD Fn of Sn is 

1 ^ 



s{z) 



^ i=l 

whose Stieltjes transform is 



^n{z) = [ —dFn{x) = -Y.Y^ 
J X- z \- 



Next, we present a convergence result of Fn in Silverstein (1995) which is 
the basis of our estimation method in the next section. 

Lemma 1. Suppose that the entries of Xn{p x n) are complex random variables 
which are independent for each n and identically distributed for all n, and satisfy 
E(2;ii) = anc/ E(|xiip) = 1. Also, assume thatTn is apxp random Hermitian 
nonnegative definite matrix, independent of Xn, and the empirical distribution 
F^" converges almost surely to a probability measure H on [0,oo) as n ^ oo. 

1/2 1/2 

Set Bn = Tn XnX*Tn jn. When p = p{n) with p/n— )-c>0 as n ^ oo, then, 
almost surely, the empirical spectral distribution converges in distribution, 
as n ^ oo, to a (non-random) probability measure F, whose Stieltjes transform 
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s = s{z) is a solution to the equation 

1 



-dH{t). (2.1) 



t{l — c — czs) — z 
The solution is also unique in the set {sSC: — (1 — c)/z + csG C^}. 

It will be more convenient to use a companion distribution = {l—p/n)6o+ 
{p/n)Fn with Stiletjes transform 

z n z n ^-^ Ai — z 

1=1 

The corresponding limit is s{z) = —(1 — c)/z + cs{z) and it satisfies the following 
important equation which is a variant of Equation (2.1), 

Both Equation (2.1) and Equation (2.2) are referred as the Marcenko-Pastur 
equation. 

Since the convergence in distribution of probability measures implies the 
pointwise convergence of the associated Stieltjes transforms, by Lemma 1, s.^{z) 
converges to s{z) almost surely, for any z G C\M. In Silverstein and Choi (1995), 
the convergence is extended to S'i;'\{0}, and thus we conclude that for sufficiently 
large n, Sni^) converges to s{z) almost surely for every z £ C \ (Sp U {0}). 

2.2 Division of a PSD H 

As mentioned in Introduction, our new method relies on a division of a 
PSD H according to the separation of the corresponding LSD F. Suppose that 
the support Sp of F consists of m (m > 1) disjoint compact intervals, Si = 
[x^ , xf ],..., Sm = sorted in an increasing order. Choose {i = 

1, . . . , m) satisfying 

5]" < xj" < x| < < 5^ < • • • < 5^-1 < S~ < X- <x^< 5+. (2.3) 

Notice that when z = x is restricted to Sp, u{x) = —l/s{x) is monotonically 
increasing and takes values in 5*1^ (Silverstein and Choi, 1995). We have then 

u{S^) < u{6t) < u{6^) < ■ ■ ■< u((^+_i) < u{6-) < u{S+) 
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Figure 1: The curves of u{x) on S'p n M+ with Hi = 0.3(5i + 0.4,54 + OM5 and ci = 0.1 
(left), and H2 = 0.5,5i + 0.5(52 and C2 = 4 (right). 

and 

m 

SHc[j[ui6;)MSt)]- 

i=l 

Consequently, we can match each compact interval of Sp with a disjoint part of 
Sh by 

Si^ SHn[u{6r),u{d+)], i = l,...,m, (2.4) 
and hence, the PSD H admits a division as follows: 

H,{A)= dH, AeB, i = l,...,m, 

JHSr),u{S+)]nA 

where B is the class of Borel sets of M. Obviously, X^i^i ^« — ^■ 

The map in (2.4) can be easily found out from the graph of u{x) on Sp. Two 
typical representations of the graph are shown in Figure 1. The figures show that 
when c < 1, each compact interval of Sp corresponds to masses of H that fall 
within this interval. But this is not true when c > 1 as shown in the right panel 
of Figure 1 where the mass 1 falls outside the interval 

2.3 Moments of a discrete measure 

Let be a discrete measure G = J2i=i ''^^i^bi where hi < ■ ■ ■ < hk are k masses 
with respective positive weights {rrii}. Here, we don't assume ^m-j = 1 and G 
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can be a sub-probability measure. Define the l-th. moment of G as 

k 



ji = ^rnib[, 1 = 0,1,..., 



i=l 

and the A^-th Hankel matrix related to G as 

/ 70 71 

T{G,N) = 



71 72 



7N 



\7Af-l In ■■■ 12N-2/ 

Proposition 1. The Hankel matrix T[G, k) is positive definite, and its determi- 
nant is 



det{r{G,k)) = llm, H ih-bjf. 

i=l l<i<j<k 



Furthermore, 



det(r(G,iV)) = 0, N>k. 
Proof. Write M = diag(mi, . . . , m^) a diagonal matrix, and 



(2.5) 



(2.6) 



B 



( 1 1 

hi 62 



1 \ 

bk 



which is a square Vandermonde matrix whose determinant is well known to be 
Y[i<i<j<k(bj — i>i)- From this and the fact that T{G, k) = BMB^, we get Equa- 
tion (2.5). 

Based on the above conclusion, Equation (2.6) and the positive definiteness 
of r(G, k) can be verified by a direct calculation. □ 

Our aim here is to find an efficient inversion formula to these moment equa- 
tions and the formula will be on the basis of our inference procedure below. 
Define a degree-A; polynomial P{x) as 

k k 
P{x) = JJ(X -bi) = ^ CiX\ Ck = 1. 



i=l 



i=0 
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Then, the coefficients Cj's of P{x) and the moments 7i's of G have the following 
relationship. 

Proposition 2. Let c = (cq, . . . , Ck~i)' and 7 = (7/;, . . . , ^2k-i)' ■ Then, 

r(G, A;) • c + 7 = 0. 

Proof. It is easily verified. □ 

Propositions 1 and 2 establish a one-to-one map between the parameters of 
G and its moments. They further tell us that the masses of G are all zeros of 
P{x) with coefficients c = — (r(G, k))~^ • 7 and Ck = 1- As to the weights of G, 
they can be trivially obtained by solving linear equations, 

k 

^mi6' = 7;, l = 0,...,k-l, 
1=1 

with 5j's known. 
3. Estimation 

3.1 Model and estimation strategy 

We consider a class of discrete PSDs with finite support on M"*", that is, 

where 

Q = <6 = {ai,wi, . . . ,ak,Wk) : < ai < • • • < afc < 00; Wi > 0, ^Wi = l\. 
^ 1=1 J 

Here, the order k H is assumed known (when k is also to be estimated, a 

consistent estimator of k is given in Chen et al. (2011)). 

Suppose that the support Sp of the LSD F associated to H and c has m 
(1 < m < k) disjoint compact intervals. According to the discussion in Section 
2, H can be divided into m parts. Hi, . . . ,Hm, with Hi consisting of ki masses 
of H, ki > 1 and Y2^i ~ ^■ 

When fcj's are all known and equal to 1, the assumption reduces to the split 
case in Mestre (2008). By contrast, we consider that ki's are unknown, can be 
larger than 1, and are not necessarily equal. 

Our estimation strategy is the following: 
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1) determine the division of H according to the separation of clusters of sample 
eigenvalues; 

2) for each part Hi, obtain strongly consistent estimators of its moments; 

3) obtain a strongly consistent estimator k.„ of the partition {ki, . . . ,km) of 
numbers of masses in the m parts Hi, . . . , Hm', 

4) by combination of these estimators and using the method of moments, fi- 
nally obtain consistent estimators of all the weights and masses {wi, ai). 

Note that in the first step, an accurate division of H may not be always 
achieved, especially when sample sizes are relatively small. A solution to this 
problem will be given later. 

3.2 Estimation of the moments of Hi 

The following theorem re-expresses the moments of Hi by contour integrals 
related to the companion Stieltjes transform s{z). 

Theorem 1. Suppose the assumptions in Lemma 1 are fulfilled, then the l-th 
moment of Hi can he expressed as 

c2-KiJc^ s^{z) 

where Ci is a positively oriented contour described by the boundary of the rectangle 

{zGC:6r <R{z)<6+,\'^{z)\<l}, 
where 5^ (i = 1, . . . , m) are defined by (2.3) and 6^ < if c> I. 
Proof. Let the image of Ci under u{z) = l/s{z) be 

u{Ci) = {u{z) : z G Ci}. 

Notice that s{z) is holomorphic on Cj. Then, u{Ci) is a simple closed curve 
taking values on C \ {Sh U {0}). (The function u{z) = —l/s{z) is analytic on d 
and is a one-to-one map from Ci to its image u{Ci). Thus, the two curves Ci and 
u{Ci) are homeomorphic. Since Ci is simple and closed (homeomorphic to a unit 
circle in C), its image is also simple and closed.) Moreover, since Q{u{z)) ^ 
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for all z with / 0, we have u(Ci) fl M = {n((5r), u(b^y\ and u{Ci) encloses 
[n(5j^), u((5j^)]. Therefore, n(Cj) encloses only 5//^ and no other masses of H. 
Applying this change of variable to the right hand side of (3.1), we have 



(-1/-- / 



zi{z) 
c2tt\ Jc, s\z) 



dz 



1 1 
c 27ri 

1 1 

c 27ri 

1 
27ri 



z{u)v}' "^du 



( 

u{C\) 



tu 



l-l 



u — t 



dH{t)du 



tu 



l-l 



uid) 



u — t 



dudH{t) 



where the second equation is from the Marcenko-Pastur equation, and the last 
equation follows from the residue theorem. □ 

By substituting the empirical Stieltjes transform s^{z) for s{z) in (3.1), we 
get a natural estimator of ^i^f. 



dz, Z = l,2, .... 



(3.2) 



Theorem 2. Under the assumptions of Lemma 1, for each I (Z > 1), 7j^; con- 
verges almost surely to ^i^i. 

Proof. From the fact that for sufficiently large n, with probability one, there are 
no sample eigenvalues located outside Sp (Bai and Silverstein, 1998), we have 
then, for sufficiently large n, z^j^{z) / ^^{z) as well as zs^{z) / ^{z) are continuous 
on Cj, and thus bounded on the contour. By the convergence of s„(2:) and the 
dominated convergence theorem, almost surely. 



ZS [Z] 



< 



zs'{z) zs^^{z) 



Ci 



s\z) s^z) 



dz 
\dz\ 



0, n 



oo. 



□ 



A technical issue here is the contour integration in (3.2). It can be calculated 
by the residue theorem and an algorithm is described in Appendix. 
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3.3 Estimation of the partition {ki, . . . , km) 

Denote by k = (fci, . . . , km)' the vector of orders of Hi's, the collection of all 
possible values of k is 

m 

K = {k:ki>l, ^ki = k}. 

1=1 

Let ko = {ko^i, . . . , ko^m)' be the true value of k. From Proposition 1, we know 
that the smallest eigenvalue Aniin(r(-ffj, ^j)) the Hankel matrix T{Hi,ki) is 
positive if ki < ko^i, and otherwise 0. Based on this property, we construct the 
following objective function 

5t(k) = min{Amm(r(i?i, fcj)), i = l,...,m}, k G K, 

that satisfies 

g{ko) > and ^(k) = (k / ko). 
So, an estimator of kg can be obtained by maximizing the estimate of (7(k), i.e. 

k„ = arg max ^(k) 
keK 

= argmaxmini Amm(f(-f/'i,/ci)), i = l,...,m\, 

where T{Hi, ki) = (7i,r+s-2)i<r, s<ki with its entries defined by (3.2). 

Note that when evaluating the estimator k„, it is not necessary to compare 
^(k)'s at all k-points, but only at a small part of them. More precisely, for the 
i-th. element ki of k, in theory, its value may range from 1 to /c — m + 1 and its 
true value A;o,j makes T(Hi, ki) positive definite. This implies that if T{Hi, ki) is 
non-positive definite then ki ^ fco.i (actually ki > k^^i). Based on this knowledge, 
in practice, it is enough to consider ki that belongs to a set {1, . . . where 
di < k — m + 1 stands for the largest integer such that T{Hi,di) is positive 
definite. This technique can effectively reduce the computational burden when 
the cardinality of K is large. 

Theorem 3. Under the assumptions of Lemma 1, almost surely, 

k„ —7- kg, as n — )• oo. 

Proof. The conclusion follows from Theorem 2 and the fact that kg is the unique 
maximizer of the function fi'(k) on the finite set K. □ 
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3.4 Estimation of 9 

By Theorem 3 and since the partition set K is finite, almost surely, k„ = ko 
eventually. As far as the consistency is concerned for estimation of 0, we may 
assume in this section that the partition k is known without loss of general- 
ity. Then, the estimator On of 6 is defined to be a solution of the following 2k 
equations: 



where 7j^o = Vi/v^ i = 1, . . . (w is the total number of positive sample eigen- 
values and Vi is the number of those forming the i-t\i cluster). We call On the 
local moment estimator (LME) of 0, since it is obtained by the moments of -fTi's, 
rather than the moments of H . Accordingly, the LME of H is H = H{On)- When 
ki = ■ ■ ■ = km = 1, the LME reduces to the one in Mestre (2008). 

The solution of the moment equations (3.3) exists and is unique if the matri- 
ces T{Hi, /ci)'s are all positive definite. Moreover, a fast algorithm for the solution 
exists following the equations given in Section 2.3: indeed, the algorithm needs 
to solve a one- variable polynomial equation and a linear system. 

Next, we establish the strong consistency of the LME as follows. 

Theorem 4. In addition to the assumptions in Lemma 1, suppose that the true 
value of the parameter vector Oq is an inner point of Then, the LME On is 
strongly consistent: almost surely, 

Gq, n — )• oo. 

Proof. Write 0„, = (^in, . . . , Omn), where Oin is the LME of the parameter vector 
OiQ oi Hi (i = 1, . . . , m). It is sufficient to prove that, almost surely, 



for each i (i = 1, . . . , m). 

Let hi be the function R'^''^ — > R'^^^: 

^i^li = (7i,o,---,7i,2fc,-i)- 

Then the multivariate function hi is invertible from the conclusions of Proposi- 
tions 1 and 2. 




(3.3) 



Oin Oio, n-f OO, 
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Denote 7j„ = (7^,0, ■ ■ ■ ,li,2ki-i) and 7^0 = ^{610). By the convergence of 
7j„ (Theorem 2) and the impHcit function theorem, there exists a neighborhood 
Ui of OiQ and a neighborhood Vi of 7^0, such that hi is a differomorphism from 
Ui onto Fj. Moreover, 0j„ = /ij~^(7j„) € f/j exists almost surely for all large n. 
Therefore, 6 in converges to 6^) = h^^i'fiQ) almost surely, as n — )• 00. □ 

3.5 A generalization of the local moment estimator 

The proposed estimation procedure needs a good judgment on the separation 
of clusters of sample eigenvalues. This may be indeed a problem when two or 
more adjacent clusters are very close, which can happen when the sample size 
is too small. To handle this problem, we introduce here a generalized version 
of the estimation procedure. The resulting estimator is referred as generalized 
LME (GLME). 

Suppose that the support Sp has m (> 1) disjoint compact intervals, and ac- 
cordingly H gains a division of m parts: Hi, . . . , Hm- Without loss generality, we 
suppose that the first two clusters of sample eigenvalues have no clear separation 
under a situation of finite sample size. Our strategy to cope with this is simply 
to merge these two clusters into one and treat Hi and H2 as a whole. Then, 
the GMLE can be obtained by conducting a similar procedure of estimation as 
mentioned in Section 3.1. 

An extreme case of the GLME is to merge all clusters into one, then one may 
find with surprise that the GLME becomes a "full moment" estimator which is 
equivalent to the moment estimator in Bai et al. (2010). In this sense, the GLME 
encompasses this moment method. However, the merging procedure may result 
in a reduction of estimation efficiency, which will be illustrated numerically in 
the next section. 

On theoretical aspect, it can be easily shown that Theorems 1-3 still hold 
true after the merging procedure. We can therefore obtain the strong convergence 
of the GLME by a similar proof of Theorem 4. Hence these proofs are omitted. 

4. Simulation 

In this section, simulations are carried out to examine the performance of 
the proposed estimator comparing with the estimator in Mestre (2008) (referred 
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as ME), and the one in Bai et al. (2010) (referred as BCY). 

Samples are drawn from mean-zero normal distribution with {p, n) = (320, 
1000) for the estimation of H, and (p, n) = (320, 1000), (160, 500), (64, 200), 
(32, 100), (16, 50) for the estimation of the partition k of H. The independent 
replications are 1000. More p/n combinations are considered for the partition 
estimator k,i since this step has a primary importance on the overall performance 
of the procedure. 

In order to measure the distance between H and its estimate we consider 
the Wasserstein distance d = J {Quit) — Qg{t)\dt where Q^{t) is the quantile 
function of a probability measure ^. Execution times are also provided for one 
realization of H in seconds. All programs are realized in Mathematica 8 software, 
and run on a PC equipped with 3.5GHk CPU and 8GB physical RAM. 

We first consider a case in Mestre (2008) where H = 0.5(5i +0.25^7+0.125(^15+ 
0.125(^25 and c = 0.32. In this case, H has four atoms at 1, 7, 15, and 25, while the 
sample eigenvalues form three clusters, and spread over Sp = [0.2615, 1.6935] U 
[3.2610, 10.1562] U [10.2899, 38.0931] in the limit, see Figure 2. In Mestre's paper, 
it was shown that the ME performed very well by assuming all weight parameters 
(multiplicities) being known even if the splitting condition is not verified by the 
last two atoms. 

In the viewpoint of the LME method, the PSD H can only be divided into 
three parts: Hi = 0.55i, H2 = 0.25^7, and H3 = 0.1255i5 + 0.125(^25. Thus, the 
true partition of H is ko = (1, 1, 2). Table 1 presents the frequency of estimates 
of the partition k. The results show that the true model can be identified with 
an accuracy of 100% when the sample size n is larger than 200, and the accuracy 
decreases as n goes smaller. 

Table 2 presents statistics for the three estimators of H. The first six rows 
are results assuming all the weights {wi} are known, while in the last four rows 
are results assuming only {wi,W2} are known and W3 is to be estimated {104 is 
determined by = 1). Overall, the LME is as good as the ME when all 

weights are known, and is much better than the BCY in all cases. When W3 
is unknown, the problem is harder resulting larger distance values of d for both 
methods LME and BCY. This difficulty is also reflected by larger variances of 
the estimates of 03 and 04 which are closely related to the parameter ws (and 
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Figure 2: The density curve of F (left) and the average of the i-th {i — 1, . . . , 320) sample 
eigenvalues (right) from 1000 rephcations for H = 0.56i + 0.25Sj + 0.1255i5 + 0.125(525 
and c = 0.32. 



Table 1: Frequency of estimates for the partition of H: Hi = Q.5Si, H2 ~ 0.25(^7, 
i/s = 0.125(5i5 + 0.125(525 with p/n ^ 0.32. 



Dimensions 


k=(l,l,2)' 


k=(l,2,l)' 


k=(2,l,l)' 


(j),n) = (320,1000) 


1000 








(p,n) = (160,500) 


1000 








(p, n) = (64, 200) 


999 





1 


(p,n) = (32,100) 


896 


45 


59 


{p,n) = (16,50) 


623 


169 


208 



Wi). Concerning the execution time shown in the table, the BCY is the fastest 
followed by the ME, and then by the LME. However, the elapsed time of the BCY 
estimation increases rapidly as the number of unknown parameters increases. 

It should be noticed that in general when the splitting condition is not sat- 
isfied, the performance of the ME may decrease sharply, and the estimates may 
suffer from large biases. Next, we show this phenomenon and also examine the 
performances the LME and the BCY in such situations. 

We consider a similar model where the third atom of H is set to be 20 
instead of 15 and other settings remain unchanged, that is, H = 0.55i +0.25(57 + 
0.125(520 + 0.125^25 and c = 0.32. The empirical and limiting distributions of 
sample eigenvalues are illustrated in Figure 3, where Sp = [0.2617, 1.6951] U 
[3.2916, 10.4557] U [12.3253, 39.2608]. 
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Table 2: Estimates for H = 0.5(5i + 0.25(57 + 0.125(5i5 + 0.125^25 with p = 320 and 
n = 1000. 







fll 


0.2 


03 


W3 


04 


d 


Time 


ME 


Mean 


1.0000 


7.0031 


14.9987 




25.0001 


0.0425 


0.533s 




St. D. 


0.0041 


0.0407 


0.1368 




0.1964 


0.0199 




LME 


Mean 


1.0000 


7.0060 


14.9533 




25.0381 


0.0447 


0.578s 




St. D. 


0.0040 


0.0401 


0.1371 




0.2033 


0.0205 




BCY 


Mean 


0.9924 


7.0387 


14.8968 




25.0658 


0.0887 


0.147s 




St. D. 


0.0189 


0.1204 


0.3027 




0.2312 


0.0554 




LME* 


Mean 


1.0000 


7.0027 


14.9935 


0.1259 


25.0772 


0.1136 


0.890s 




St. D. 


0.0040 


0.0401 


0.2398 


0.0059 


0.3520 


0.0662 




BCY* 


Mean 


1.0012 


6.9806 


15.1350 


0.1288 


25.1728 


0.2143 


0.710s 




St. D. 


0.0082 


0.0753 


0.5738 


0.0113 


0.4903 


0.1368 





Table 3: Frequency of estimates for the partition of H: Hi = Q.5Si, H2 ~ 0.25Sj, 
H3 = 0.125(520 + 0.125(525 with p/n = 0.32. 



Dimensions 


k=(l,l,2)' 


k=(l,2,l)' 


k=(2,l,l)' 


(p, n) = (320, 1000) 


1000 








{p,n) = (160,500) 


922 


28 


50 


(p, n) = (64, 200) 


595 


183 


222 


(p, n) = (32, 100) 


455 


267 


278 


(p,n) = (16,50) 


376 


260 


364 



Analogous statistics are summarized in Tables 3 and 4. The results in Table 
3 show that the estimation of the partition k is more difficult in this case, but 
its accuracy still achieves 100% with the sample size n = 1000. The statistics 
in Table 4 reveal that the estimators of 03 and 04 from the ME have a bias as 
large as 0.85 in average when all weight parameters are assumed known, while 
the LME and the BCY are unbiased in the same settings. On the other hand, it 
is again confirmed that the LME improves upon the BCY, especially when the 
weight parameters are partially unknown. 

Finally, we study a case where H = 0.55i + 0.25(53 + 0.1255i5 + 0.125^25 
and c = 0.32 to examine the performance of the GLME. The empirical and 
limiting distributions of sample eigenvalues are illustrated in Figure 4, where 
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0.6 r 



50 100 150 200 250 300 



Figure 3: The density curve of F (left) and the average of the i-th (i = 1, . . . , 320) sample 
eigenvalues (right) from 1000 replications for H = 0.5Si + 0.25^7 + 0.125520 + 0.125525 
and c = 0.32. 



Table 4: Estimates for H = 0.5(5i + 0.2557 + 0.125^20 + 0.125^25 with p = 320 and 
n = 1000. 







fli 


0-2 


as 


W3 


04 


d 


Time 


ME 


Mean 


1.0001 


6.9996 


19.1483 




25.8521 


0.2224 


0.533s 




St. D. 


0.0041 


0.0395 


0.1836 




0.2068 


0.0404 




LME 


Mean 


1.0000 


7.0006 


19.9157 




25.0811 


0.0620 


0.575s 




St. D. 


0.0040 


0.0391 


0.2404 




0.2631 


0.0341 




BCY 


Mean 


0.9965 


7.0090 


19.9028 




25.0874 


0.0875 


0.142s 




St. D. 


0.0126 


0.0692 


0.3456 




0.3155 


0.0516 




LME* 


Mean 


1.0000 


7.0003 


19.8739 


0.1282 


25.2896 


0.2588 


0.896s 




St. D. 


0.0039 


0.0390 


0.7883 


0.0342 


0.8857 


0.1464 




BCY* 


Mean 


0.9993 


6.9983 


19.8587 


0.1331 


25.4569 


0.3286 


0.865s 




St. D. 


0.0054 


0.0446 


1.2884 


0.0437 


1.0888 


0.1685 
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Figure 4: The density curve of F (left) and the average of the i-th (i = 1, . . . , 320) sample 
eigenvalues (right) from 1000 replications for H = 0.5(5i + 0.25^3 + 0. 125^15 + 0.125(525 
and c = 0.32. 

Sp = [0.2552, 1.6086] U [1.6609, 4.7592] U [9.1912, 37.6300]. With the used dimen- 
sions, the first two clusters of sample eigenvalues are too close to be identified, 
and we have to merge these two clusters into one to get the GLME of H (thus no 
weight parameters are known at all). For comparison, we also present the LME 
by assuming that we know the true separation Sp into three intervals (which 
is not seen from the data). 

Statistics in Table 5 show a perfect estimation of k with sample sizes n = 
500, 1000. Results in Table 6 demonstrate that the GMLE has a very good 
performance with only a slight reduction in estimation efficiency compared with 
the (impractical) LME. 

Note that the BCY becomes unstable for this model as, for example, the 
empirical moment equations defining the estimator often have no real solutions. 
A major reason is that the required estimates of the 6-th and 7-th moments of 
H have poor accuracy in such a situation. 

5. Conclusions and remarks 

This paper investigates the problem of estimating the population spectral 
distribution from the sample eigenvalues in large dimensional framework. A 
local moment estimation procedure is proposed, by considering the division of a 
discrete PSD H according to the separation of the LSD F. The new estimates 
are easy to compute and are proved to be consistent. 
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Tabic 5: Frequency of estimates for the partition of H: Hi = 0.5(5i + 0.25(53, = 
0.1255i5 + 0.125(525 with p/n = 0.32. 



Dimensions 


k=(2,2)' 


k=(l,3)' 


k=(3,l)' 


(p, n) = (320, 1000) 


1000 








{p, n) = (160, 500) 


1000 








(p, n) = (64, 200) 


984 





16 


(p,n) = (32,100) 


911 





89 


(p,n) = (16,50) 


865 





135 



Table 6: Estimates for H = 0.5^i + 0.25(^3 + 0.125^15 + 0.125(525 with p = 320 anti 
n = 1000. 







ai 


Wi 


a2 


W2 


as 


GLME 


Mean 


1.0015 


0.5015 


3.0089 


0.2485 


15.0133 




St. D. 


0.0080 


0.0043 


0.0270 


0.0043 


0.2243 


LME 


Mean 


1.0003 




2.9996 




15.0061 




St. D. 


0.0042 




0.0165 




0.2267 






W3 






d 


Time 


GLME 


Mean 


0.1265 


25.1109 


0.1235 


0.1188 


0.817s 




St. D. 


0.0058 


0.3361 


0.0058 


0.0639 




LME 


Mean 


0.1262 


25.1058 


0.1238 


0.1074 


0.820s 




St. D. 


0.0058 


0.3428 


0.0058 


0.0641 
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Our estimation procedure can be seen as an extension of the method in 
Mestre (2008). The extension mainly focus on two aspects: first, the asymptotic 
clusters of sample eigenvalues generated by different population eigenvalues are 
not necessarily separate, that is, we drop the splitting condition; second, we don't 
need to know the weight parameters beforehand. These improvements enable our 
approach to be applied successfully to more complex PSDs. 

At last, the proposed method is more efficient than that in Bai et al. (2010). 
This could be attributed to two facts: our estimator uses much lower moments 
of the PSD H (the highest order of the moments is 2max ki — \ used in the LME 
while it is 2A; — 1 used in the BCY); moreover, our estimator is localized, then 
more efficient by removing possible mixture effect brought by sample eigenvalues 
from different -ffj's. 

Appendix: Calculation of the contour integrals in Equation (3.2) 

The possible poles in (3.2) are sample eigenvalues and zeros of s„(ii) on the 
real line. Thus, the next step is to determine which poles fall within the i-th 
integration region Cj. 

Let V = min{p, n} and Ai < • • • < A^, be the nonzero sample eigenvalues. 
According to the main theorems in Bai and Silverstein (1999), these sample 
eigenvalues should form m separate clusters for all large p and n. Thus, with 
probability one, the i-th cluster of sample eigenvalues, denoted by Ai, falls within 
Ci for all large p and n. 

On the other hand, notice that s^{u) = is equivalent to X]i'=i — u) = 

n (except for p/n = 1, where the second equation would have an additional zero 
solution). Let /ii < • • • < /x^ be zeros of s„(m) (define /Ui = if p/n = 1), we 
have then 

< Al < /U2 • • • < ^t, < At,. 

Let Bi = {^i : /ij 7^ 0, Aj G Ai} (i = 1, . . . , m). From the proof of Lemma 1 
in Mestre (2008), we know that, with probability one, Bi falls within Ci for all 
large p and n. A representation of Ai^s, i?i's, and Cj's is shown in Figure A.l 
for a simple case. In order to differentiate between j4j's and -Bj's, the elements 
of j4j's are plotted on the line y = 0.05 and those of i?j's are plotted on the line 
y = -0.05. 
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Cn 



+ j; 
It 



Figure A.l: Representation of Ai, Bi, and Ci {i = 1,2) where H = O.'SSi + OASi + 
0.3(55, {c,p,n) = (0.1,100,1000), and Sp = [xi,x+] U [x^,x+] = [0.6127,1.2632] U 
[2.3484,7.4137]. 



Therefore, the contour integral in (3.2) is formulated (approximately) as 

_L / = E Res(/;„, A) + ^ Resifin,f^), (A.l) 

where fin{z) = z^^{z) / ^^{z). The residues in (A.l) can be obtained by some 
elementary calculations. Residues from Ai are simple: 

Res(/z„,A) = -A/(/ = l). 

Residues from Bi are listed below for Z = 1, . . . , 5: 



R'es(/;„,/u) 



1) , 

2) , 

3) , 

4) , 

5) . 



For larger order I, we may get an analytic expression of Res(/te,/x) from the 
following Mathematica code (here, the order of the moment is set to be 3): 

k = 3; * input the order of moment * 
f = ( z— mu) " k*z*D[ sn [ z ] , z ] / ( sn [ z ] ) " k ; 
D[f ,{z,k-l}]; 

D[%*sn [z] ^ (2k-l) ,{z , 2 k - 1 }] / . z->mu; 
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D[sn [z] , z] ' (2k-l)(k-l)!(2k-l)!/.z->mu; 
Simplify [%%/%,sn [mu]= = 0] 
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